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Abstract 

We have performed multicanonical simulations of hydrophobic-hydrophilic heteropolymers with 
a simple effective, coarse-grained off-lattice model to study the structure and the topology of 
the energy surface. The multicanonical method samples the whole rugged energy landscape, in 



particular the low-energy part, and enables one to better understand the critical behaviors and 
visualize the folding pathways of the considered protein model. 
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I. INTRODUCTION 



Predicting the proteins structure and the folding mechanism is an important goal in struc- 
tural biology. Although the physical principles are known, the complexity of proteins as 
being macromolecules consisting of numerous atoms makes an accurate analysis of the fold- 
ing process of realistic proteins extremely difficult. The configuration space of proteins 
presents a complex energy profile consisting of tremendous number of local minima, barriers 
and further topological features. Because of energy barriers, the commonly used thermody- 
namic simulation techniques are not very efficient in sampling of a rugged landscape. The 
simulated molecule often remains in its starting wide microstate or move to a neighboring 
wide microstate, but in practise it will hardly reach the most stable one. The system may 
be trapped in a basin for a long time, which results in nonergodic behavior. On the other 
hand, the topography of the energy landscape, especially near the global minimum is of 
particular importance; this is because the potential energy surface defines the behavior of 
the system. Consequently, a visualization of the whole rugged landscape, covering the entire 
energy and temperature ranges, would be helpful in developing methods to allow one to 
survey the distribution of structures in conformational space and also gives the details of 
the folding pathway. Such a goal can be achieved with the multicanonical approach. 

The problem of protein folding entails the study of a nontrivial dynamics along pathways 
embedded in a rugged energy landscape. Therefore, one of the important aspects in this field 
is studying simple effective, coarse-grained models that allows a more global view on the 
relationship between the sequence of amino acid residues and the existence of a funnel-like 
structure along a pathway towards the energy minimum in a rugged free-energy landscape [l| ■ 

One of the most known examples is the HP model of lattice proteins, which has been 
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exhaustively investigated [2J, 13[. In this model, only two types of monomers are considered, 
with hydrophobic (H) and polar (P) character. Chains on the lattice are self-avoiding to 
account for the excluded volume. The only explicit interaction is between non-adjacent 
but next-neighbored hydrophobic monomers. This interaction of hydrophobic contacts is 
attractive to force the formation of a compact hydrophobic core which is screened from the 
(hypothetic) aqueous environment by the polar residues. Statistical mechanics aspects of 



this model are being subject of recent studies 



m. 
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Another off-lattice generalisation of the HP model is the AB model [7| , where the hydropho- 
bic monomers are labelled by A and the polar or hydrophilic ones by B. The contact 
interaction is replaced by a more realistic distance-dependent Lennard- Jones type of poten- 
tial accounting for short-range excluded volume repulsion and long-range interaction; the 
latter being attractive for AA and BB pairs and repulsive for AB pairs of monomers. An 
additional interaction accounts for the bending energy of any pair of successive bonds. This 
model was first applied in two dimensions Ml and generalized to three-dimensional AB pro- 
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teins |8|, |9J, with modifications by taking implicitly into account the additional torsional 
energy contributions of each bond. 



II. THE MODEL 

In this work, we study the effective off-lattice AB model of heteropolymers with N monomers. 
AB model as proposed in Ref. 8j has the energy function 

N-2 N-3 

E = -K X h k ■ h k+1 - k 2 b k ■ b k+2 + 

k=l k=l 
N-2 N , , 

i=l j=i+2 V »J 0' 

where h k is the bond vector between the monomers k and k + 1 with length unity. In Ref. 8| 
different values for the parameter set (ki, k?) were tested and finally set to (—1, 0.5) as this 
choice provide both the distributions for the angles between bond vectors and b^+i and 
the torsion angles between the surface vectors x b^+i and b^+i x h k+2 giving the best 
agreement with the distributions obtained for selected functional proteins. Since b^ • b^ +1 = 
cosi?fc, the choice K\ — — 1 makes the coupling between successive bonds "antiferromagnetic" . 
The second term in Eq. (TjQ) takes torsional interactions into account. The third term contains 
a pure Lennard- Jones potential, where the 1 / r% long-range interaction is attractive whatever 
types of monomers interact. The monomer-specific prefactor C(<Ji,<jj) only controls the 
depth of the Lennard- Jones valley: 

+ 1, <7j, (To = A, 

C{a h a j ) = { 1 (2) 

f-1/2, <Tj,(Tj = B or cr, f: Oy 



For technical reasons, we have introduced in both models a cut-off rjj = 0.5 for the Lennard- 
Jones potentials, below which the potential is repulsive hard-core (i.e., the potential is 
infinite) . 

For updating a conformation, as shown in Fig. [H the length of the bonds are fixed (|bfc| = 1, 
k = 1, . . . , N — 1). The (z+ l)th monomer lies on the surface of a unit sphere centered on the 
ith monomer. Therefore, spherical coordinates are the natural choice for calculating the new 
position of the (i + l)th monomer on this sphere. For the reason of efficiency, all the points 
on the sphere are not selected for updating, but restricted the choice to a spherical cap with 
maximum opening angle 2# max (the dark area in Fig. [1]). Thus, to change the position of 
the (i + l)th monomer to (i + 1)', we select the angles 9 and ip randomly from the respective 
intervals cos^ax < cos 6 < 1 and < (p < 2tt, which ensure a uniform distribution of the 
(i + l)th monomer's positions on the associated spherical cap. After updating the position of 
the (i + l)th monomer, the following monomers in the chain are simply translated according 
to the corresponding bond vectors which remained unchanged in this update. This is similar 
to single spin updates in local-update Monte Carlo simulations of the classical Heisenberg 
model with the difference that, in addition to local energy changes, long-range interactions 
of the monomers are to be computed anew after the update, due to changing their relative 
position. 



III. THE METHOD 

The multicanonical ensemble is based on a probability function in which the different energies 
are equally probable. However, implementation of the multicanonical algorithm (MUCA) is 
not straightforward because the density of states n(E) is a priori unknown. In practice, one 
only needs to know the weights lo, 

w(E) ~ l/n(E) = exp[(£ - F T{E) )/k B T(E)}, (3) 

and these weights are calculated in the first stage of simulation process by an iterative 
procedure in which the temperatures T(E) are built recursively together with the micro- 
canonical free energies F T ^/k,BT{E) up to an additive constant. The iterative procedure is 
followed by a long production run based on the fixed w's where equilibrium configurations 
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are sampled. Re-weighting techniques (see Ferrenberg and Swendsen [10|] and literature 
given in their second reference) enable one to obtain Boltzmann averages of various physical 
variables over a wide range of temperatures. 

As pointed out above, calculation of the a priori unknown MUCA weights is not trivial, 
requiring an experienced intervention. For lattice models, this problem was addressed in a 
sketchy way by Berg and Celik [11] and later by Berg 12J . An alternative way is to establish 



an automatic process by incorporating the statistical errors within the recursion procedure. 



IV. RESULTS AND DISCUSSIONS 

We first carried out canonical (i.e., constant T) MC simulations of six different 20-monomer 



sequences 



13| at temperatures T = 0.3, 0.6 and 2.4, as well as MUCA test runs to 
determine the required energy ranges for each sequence. The multicanonical weights were 
built after m = 500 recursions during a long single simulation, where the multicanonical 
parameters were iterated every 10000 sweeps. After having fixed the MUCA weight factors, 
a production run was carried out with 5 x 10 7 sweeps. 

Fig.[2]shows, as a sample, the histograms of the multicanonical run for the sequence AAAAB- 
BAAAABAABAAABBA. We can see from the figure that the whole temperature range, in- 
cluding the hard-to-reach low-energy region, is equally well sampled. Our simulations with 
these multicanonical parameters reveiled the lowest energy conformation for this sequence 
(our suspected GEM) at E = -59.105. 

Fig. [3] and Fig. [4] display the energy and the specific heat vs. temperature for the same 
sequence. The specific heat possess a peak about the temperature T = 0.28 , which corre- 
sponds to the transition point where the system undergoes a structural change from random 
coil to more compact globular state. 

We define an order parameter (OP) [l^ 

n F 

OP=l —X^\a {t) -a {RS) \ (4) 

90 n F ^ 1 1 1 1 ' 1 ] 

where np is the number of bond and torsion angles, al RS ^ and af are the bond and torsion 
angles of the considered configuration and of the choosen reference state, which is usually 



taken as the global energy minimum (GEM) state. The difference af — a[ RS ^ is always in 
the interval [—180°, 180°] , which in turn gives 

- f < < O > T < 1 (5) 

The order parameter may be considered as a measure of coinsedence of the considered 
conformation with the reference state. 

The free energy of the system was calculated by the formula [l5| 



F(T, OP) = E — TS(OP), (6) 
and the entropy in the above formula is, 

S{OP) = log H (OP) (7) 

where H(OP) is the histogram that the system has an order parameter value OP at a fixed 
temperature T. The multicanonical algorithm provides a goal sampling of the entire energy 
(temperature) range and enables one to determine the histograms H without problem. 

In Fig. [51 we show the free energy surface with respect to the other parameter OP and 
temperature T, evaluated by utilizig the multicanonical technique. The figure displays the 
free energy distribution of the created fifty million conformations of the choosen model 
protein with respect to the order parameter and temperature. Instead of potential energy, 
we have calculated and displayed the free energies in order to include the entropic effects, 
which are more meaningfull in considerations of the folding pathways, the free energy 
surface displays a valley structure and clearly pictures the existing funnel towards the state 
of global energy minimum. This topographic picture also allows one to realize at what 
temperature the system go through a structural change. Fig. |5l together with the contour 
map given in Fig. [6J enable one to visualize where the system under consideration leaves the 
more structural energy landscape and reaches to a funnel towards the lower energy states. 
This visualization is very helpfull from point of designing more effective generic simulation 
algoritms specific to the system under consideration. 
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V. CONCLUSIONS 



We have simulated different 20-monomer sequences of the off-lattice AB model proteins by 
utilizig multicanonical ensemble approach and investigated the free energy surface. The 
obtained picture serves as a usefull tool for visualization of the behaviour of considered 
system in the configuration space and provide helpfull information in designing more effective 
simulation algorithms. 
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FIG. 1: Spherical update of the bond vector between the ith and (i + l)th monomer. 
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FIG. 2: Histogram of multicanonical simulations at T = 2AK for the sequence AAAAB- 
B AAAAB AAB A A ABB A . 
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FIG. 3: The Boltzmann average energy as a function of temperature obtained from multicanonical 
simulation for the sequence AAAABBAAAABAABAAABBA. 
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FIG. 4: Specific heat obtained from multicanonical simulation as a function of temperature. 
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FIG. 5: Free energy surface in the configuration space of sequence AAAAB- 
B AAAAB AAB A A ABB A . 
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FIG. 6: Free energy surface contour in the configuration space of sequence AAAAB- 
BAAAABAABAAABBA . 



12 



